67 research outputs found

    BAC: A bagged associative classifier for big data frameworks

    Big Data frameworks allow powerful distributed computations, extending the results achievable on a single machine. In this work, we present a novel distributed associative classifier, named BAC, based on ensemble techniques. Ensembles are a popular approach that builds several models on different subsets of the original dataset and then combines their votes into a single classification outcome. Preliminary experiments on Apache Spark show that the proposed ensemble classifier achieves a quality comparable with the single-machine version on popular real-world datasets and overcomes its scalability limits on large synthetic datasets.
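    The bagging-and-voting idea described in this abstract can be sketched in a few lines. This is a minimal single-machine illustration, not the BAC implementation: the base learner `fit_one_rule` is a hypothetical toy stand-in for an associative classifier, and the names `fit_bagged` and `predict_bagged` are assumptions introduced here.

```python
import random
from collections import Counter, defaultdict


def fit_one_rule(rows):
    """Toy base learner (hypothetical stand-in for an associative classifier).

    Maps each value of the first feature to its most frequent label and
    falls back to the overall majority label for unseen values.
    """
    by_value = defaultdict(Counter)
    overall = Counter()
    for features, label in rows:
        by_value[features[0]][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]
    table = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
    return lambda features: table.get(features[0], default)


def fit_bagged(rows, n_models=5, seed=0):
    """Train n_models base learners, each on a bootstrap sample of rows."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: sample len(rows) records with replacement.
        sample = [rng.choice(rows) for _ in rows]
        models.append(fit_one_rule(sample))
    return models


def predict_bagged(models, features):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]


# Tiny usage example on made-up data.
rows = [(("a",), "yes")] * 10 + [(("b",), "no")] * 10
models = fit_bagged(rows, n_models=5, seed=0)
prediction_a = predict_bagged(models, ("a",))
prediction_b = predict_bagged(models, ("b",))
```

    In a distributed setting each bootstrap sample would typically be one data partition, so the base models can be trained independently; that mapping is an assumption of this sketch rather than a detail stated in the abstract.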

    Explaining deep convolutional models by measuring the influence of interpretable features in image classification

    The accuracy and flexibility of Deep Convolutional Neural Networks (DCNNs) have been extensively validated over the past years. However, their intrinsic opaqueness still affects their reliability and limits their application in critical production systems, where black-box behavior is difficult to accept. This work proposes EBANO, an innovative explanation framework able to analyze the decision-making process of DCNNs in image classification by providing prediction-local and class-based model-wise explanations through the unsupervised mining of knowledge contained in multiple convolutional layers. EBANO provides detailed visual and numerical explanations thanks to two specific indexes that measure the features’ influence and their influence precision in the decision-making process. The framework has been experimentally evaluated, both quantitatively and qualitatively, by (i) analyzing its explanations with four state-of-the-art DCNN architectures, (ii) comparing its results with three state-of-the-art explanation strategies, and (iii) assessing its effectiveness and ease of understanding through human judgment, by means of an online survey. EBANO has been released as open-source code and is freely available online.

    Network Digest analysis by means of association rules

    The continuous growth in connection speed allows huge amounts of data to be transferred through a network. An important issue in this context is network traffic analysis to profile communications and detect security threats. Association rule extraction is a widely used exploratory technique which has been exploited in different contexts (e.g., network traffic characterization). However, to discover (potentially relevant) knowledge, a very low support threshold needs to be enforced, which generates an unmanageably large number of rules. To address this issue in network traffic analysis, an efficient technique to reduce traffic volume is needed. This paper presents the NEtwork Digest (NED) framework, which performs network traffic analysis by means of data mining techniques to characterize traffic data and detect anomalies. NED exploits continuous queries to efficiently perform real-time aggregation of captured network data and supports filtering operations to further reduce traffic volume by focusing on relevant data. Furthermore, NED provides an efficient algorithm to perform refinement analysis by means of association rules to discover traffic features. Extracted rules allow traffic data characterization in terms of correlation and recurrence of feature patterns. Preliminary experiments performed on different network dumps showed the efficiency and effectiveness of the NED framework in characterizing traffic data.
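    The role of the support threshold mentioned in this abstract is easier to see with the standard definitions of support and confidence written out. The sketch below uses the textbook formulas on a handful of made-up flow records; the feature strings such as `port=80` and the function names are illustrative assumptions, not part of NED.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Made-up flow records, each reduced to a set of feature=value items.
flows = [
    {"port=80", "proto=TCP"},
    {"port=80", "proto=TCP"},
    {"port=53", "proto=UDP"},
    {"port=80", "proto=UDP"},
]

rule_support = support(flows, {"port=80", "proto=TCP"})
rule_confidence = confidence(flows, {"port=80"}, {"proto=TCP"})
```

    A rule is kept only if its support reaches the chosen threshold, so lowering the threshold admits rarer patterns but also many more candidate rules, which is the trade-off the abstract describes.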

    Automating concept-drift detection by self-evaluating predictive model degradation

    A key aspect of automating predictive machine learning entails the capability of properly triggering the update of the trained model. To this aim, suitable automatic solutions to self-assess the prediction quality and the data distribution drift between the original training set and the new data have to be devised. In this paper, we propose a novel methodology to automatically detect prediction-quality degradation of machine learning models due to class-based concept drift, i.e., when new data contains samples that do not fit the set of class labels known by the currently-trained predictive model. Experiments on synthetic and real-world public datasets show the effectiveness of the proposed methodology in automatically detecting and describing concept drift caused by changes in the class-label data distributions. Comment: 5 pages, 4 figures

    Improving Wildfire Severity Classification of Deep Learning U-Nets from Satellite Images

    Uncontrolled wildfires are dangerous events capable of endangering people's safety. To counter their increasing impact in recent years, a key task is the accurate detection of the affected areas and the assessment of their damage from satellite images. Current state-of-the-art solutions address this problem through a double convolutional neural network able to automatically detect wildfires in satellite acquisitions and associate a damage index from a defined scale. However, the performance of such deep-learning models is strongly dependent on many factors. In this work, we specifically focus on a key parameter, i.e., the loss function, exploited in the underlying neural networks. Besides the state-of-the-art solutions based on the Dice-MSE, among the many loss functions proposed in literature, we focus on the Binary Cross-Entropy (BCE) and the Intersection over Union (IoU), as two representatives of the distribution-based and region-based categories, respectively. Experiments show that the BCE loss function coupled with a double-step U-Net architecture provides better results than current state-of-the-art solutions on a public labeled dataset of European wildfires.
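    The distinction between a distribution-based and a region-based loss can be illustrated with the common textbook forms of BCE and a soft IoU loss. This is a minimal sketch on flat lists of per-pixel probabilities, assuming the usual definitions; it is not the paper's training code, and the function names and the `eps` smoothing constant are assumptions introduced here.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Binary cross-entropy averaged over pixels (distribution-based)."""
    total = 0.0
    for p, y in zip(probs, targets):
        # Clip probabilities so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def soft_iou_loss(probs, targets, eps=1e-7):
    """One minus the soft intersection over union (region-based)."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    union = sum(probs) + sum(targets) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


# Made-up four-pixel mask: two burned pixels (1) and two unburned (0).
targets = [1, 1, 0, 0]
uncertain = [0.5, 0.5, 0.5, 0.5]
perfect = [1.0, 1.0, 0.0, 0.0]

bce_uncertain = bce_loss(uncertain, targets)
iou_perfect = soft_iou_loss(perfect, targets)
iou_uncertain = soft_iou_loss(uncertain, targets)
```

    BCE scores every pixel independently, while the IoU loss depends only on the overlap between the predicted and true regions, which is why the two are often treated as representatives of different loss families.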

    Exploring waste-collection fleet data: challenges in a real-world use case from multiple data providers

    In the age of connected vehicles, large amounts of data can be collected while driving through a variety of on-board sensors. The information collected can be used for various types of data-driven analytics that can be of great benefit both to vehicle owners, e.g., to reduce costs by means of predictive maintenance, and to society as a whole, e.g., to optimize mobility behavior. Prior to any real-world data analysis, an investigation and characterization of the available data is of utmost importance in order to evaluate the quality and quantity of the data and to set the right expectations. In this paper, we focus on the data exploration and characterization step, which is necessary to avoid inconsistencies in the collected parameters and to enable valid, data-driven modeling. The proposed data exploration considers both the frequency of samples and their values for all monitored parameters. A specific cross-provider data comparison is performed to compare values collected for the same vehicle at the same time from different fleet monitoring data providers. The study is applied to a real-world use case with months of data from dozens of vehicles deployed in the waste collection service managed by SEA, Soluzioni Eco Ambientali, in Italy. The analyses uncover unexpected behaviors in the measurements and lead to their early identification, bringing great benefits to the company operating the fleet by improving data collection and enabling a safe modeling phase.

    Double-Step deep learning framework to improve wildfire severity classification

    Wildfires are dangerous events which cause huge losses from natural, humanitarian, and economic perspectives. To counter their impact, fast and accurate restoration can be improved through the automatic census of the event in terms of (i) delineation of the affected areas and (ii) estimation of damage severity, using satellite images. This work proposes to extend the state-of-the-art approach, named Double-Step U-Net (DS-UNet), able to automatically detect wildfires in satellite acquisitions and to associate a damage index from a defined scale. As with any deep learning network, the performance of the DS-UNet model is strongly dependent on many factors. We propose to focus on alternatives in its main architecture by designing a configurable Double-Step Framework, which allows inspecting the prediction quality with different loss functions and convolutional neural networks used as backbones. Experimental results show that the proposed framework yields better performance, with up to 6.1% lower RMSE than the current state of the art.

    SeLINA: a Self-Learning Insightful Network Analyzer

    Understanding the behavior of a network from a large-scale traffic dataset is a challenging problem. Big data frameworks offer scalable algorithms to extract information from raw data, but often require sophisticated fine-tuning and a detailed knowledge of machine learning algorithms. To streamline this process, we propose SeLINA (Self-Learning Insightful Network Analyzer), a generic, self-tuning, simple tool to extract knowledge from network traffic measurements. SeLINA includes different data analytics techniques providing self-learning capabilities to state-of-the-art scalable approaches, jointly with parameter auto-selection to relieve the network expert of parameter tuning. We combine both unsupervised and supervised approaches to mine data with a scalable approach. SeLINA embeds mechanisms to check if the new data fits the model, to detect possible changes in the traffic, and to trigger model rebuilding, possibly automatically. The result is a system that offers human-readable models of the data with minimal user intervention, supporting domain experts in extracting actionable knowledge and highlighting possibly meaningful interpretations. SeLINA's current implementation runs on Apache Spark. We tested it on large collections of real-world passive network measurements from a nationwide ISP, investigating YouTube and P2P traffic. The experimental results confirmed the ability of SeLINA to provide insights and detect changes in the data that suggest further analyses.